Semantic PDF Segmentation for Legacy Documents in Technical Documentation

Similar resources

Semantic Indexing of Technical Documentation

This research takes place in an industrial context: the CONTINEW Company. This company ensures the storage and security of critical data and technical documentation. Consequently, these documents must be organized so that critical information can be retrieved quickly. The management of this increasing volume of documents requires document classification which is based on indexing techniqu...

Semantic Web Technologies in Technical Automotive Documentation

RDF is the format of choice to exchange data between software components of a corporate system. That is why we decided to use it in recent work at Renault, in the field of technical documentation. The prototype of a new repository for repair and diagnostic information was modeled with OWL. REST web services using RDF as the data format were built on this repository, to provide access to improved r...

Learning Semantic Correspondences in Technical Documentation

We consider the problem of translating high-level textual descriptions to formal representations in technical documentation as part of an effort to model the meaning of such documentation. We focus specifically on the problem of learning translational correspondences between text descriptions and grounded representations in the target documentation, such as formal representation of functions or...

Reconstructing Semantic Structures in Technical Documentation with Vector Space Classification

With the increasing popularity of component content management systems, a large part of technical documentation in manufacturing and mechanical engineering is written in semantically structured, XML-based information models. Content delivery portals can utilize this information to provide users with advanced retrieval or filtering functions. However, legacy content is often excluded from such g...

Layout and Content Extraction for PDF Documents

Portable Document Format (PDF) is a common output format for electronic documents. Most PDF documents are untagged and lack basic high-level information about their logical structure, which makes the reuse or modification of the documents difficult. We developed techniques that identified logical components on a PDF document page. The outlines, style attributes and the contents of the logi...

Journal

Journal title: Procedia Computer Science

Year: 2018

ISSN: 1877-0509

DOI: 10.1016/j.procs.2018.09.006